My portfolio: an introduction

What is my corpus?

When the course started, we were asked to choose a corpus. It was important to find something that allowed for meaningful comparisons and contrasts, so that we could answer a specific research question. This gave me an interesting idea: since 2014, I have been keeping track of all the songs that I have listened to, using a website called Last.fm. Every time a track is played on a media player such as Spotify or iTunes, a “scrobble” is recorded. This way, I have scrobbled a total of 121,587 tracks (and counting!). What better corpus to choose than one that contains a large part of all the music you have ever listened to? At the start of this course I had never worked with an API, and I had only just learned intermediate R skills in my third year of Psychology, but it sounded like an interesting challenge. So, I started googling.

Very soon, I found out that this might become a daunting task. Collecting all my scrobbles from the Last.fm API wasn’t the hard part; combining over 100,000 songs with their Spotify features, however, was something I was not capable of on my own. Luckily, I found a guide written by Andrew Walker, a researcher from the University of Florida, that included detailed instructions on how to do exactly this. Fetching the features would be the slowest part of the code, he said, likely taking up to 10-15 minutes. For a dataset as large as mine, that was a gross underestimation. Once I got the code working, I cut the fetching process into two parts and my dataset into five parts, and let it all run sequentially. Six long hours later, it was finally there: all my scrobbles and their corresponding Spotify features! From this point on, I knew that analyzing my corpus could lead to some very interesting results.
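As a rough illustration of the chunked approach, here is a minimal Python sketch of splitting a long list of tracks into five parts and fetching them one after another. It is not the code from the guide or from this portfolio (which was built in R): `split_into_parts`, `fetch_in_parts`, and `fake_fetch` are hypothetical names, and `fake_fetch` only stands in for a real API call.

```python
from typing import Callable, Dict, List, Sequence


def split_into_parts(items: Sequence, n_parts: int) -> List[list]:
    """Split items into n_parts contiguous parts of near-equal size."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    base, extra = divmod(len(items), n_parts)
    parts = []
    start = 0
    for i in range(n_parts):
        # The first `extra` parts each get one additional item.
        size = base + (1 if i < extra else 0)
        parts.append(list(items[start:start + size]))
        start += size
    return parts


def fetch_in_parts(
    track_ids: Sequence,
    fetch_features: Callable[[list], List[Dict]],
    n_parts: int = 5,
) -> List[Dict]:
    """Fetch features part by part, sequentially, and combine the results."""
    results: List[Dict] = []
    for part in split_into_parts(track_ids, n_parts):
        results.extend(fetch_features(part))
    return results


def fake_fetch(ids: list) -> List[Dict]:
    """Hypothetical stand-in for a real audio-features request."""
    return [{"id": track_id, "tempo": None} for track_id in ids]


if __name__ == "__main__":
    features = fetch_in_parts(list(range(12)), fake_fetch)
    print(len(features), "tracks fetched in 5 parts")
```

Working in parts like this means that a failure partway through only costs one part of the work, which matters when the whole run takes hours.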

In this portfolio, I will try to answer one main research question: how does time influence my music listening? To answer this question, I will look at three different scales of time: (1) hour of the day, (2) month of the year, and (3) year of my life (in other words, my age).

First, I will provide a visual overview of my data. Here you can find all sorts of interesting descriptive statistics for each year: how much music I have listened to, how my Spotify features have developed over time, and more. The most important explanatory variable is, of course, time.

Then, I will conduct more detailed analyses. Certain interesting patterns emerged from my preliminary analyses. How can they be explained? Can I find more information about them in chordograms, keygrams, and self-similarity matrices?

[Overview plots, one tab per year: 2015, 2016, 2017, 2018, 2019, All Years]

Chordograms and keygrams over time

Histogram of the keys of songs listened to in 2015 and 2019


In my portfolio, I want to see how my music listening has changed over the years. To analyse this, I have a corpus that consists of the songs I have listened to since 2014. One thing that might have changed is the key of the songs that I listen to. To investigate this, I made a histogram of the keys of all the songs I listened to in 2015, my final year of high school, and in 2019, when I was halfway through my second year of Psychology.

In the histogram, you can see each key as a proportion of all the songs that I listened to in 2015 and in 2019. It seems that D is the most popular key, and D# the least. There are slight differences in key between the two years, but most noticeably, it seems that I listened to far fewer songs in the key of A in 2019 than in 2015. Why is this?
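The proportions behind such a histogram can be computed by counting keys per year and dividing by that year's total, which keeps the two years comparable even if the number of songs differs. Below is a minimal Python sketch under that assumption; it is not the R code used for the plot, `key_proportions` is a hypothetical helper, and the two key lists are made-up toy data.

```python
from collections import Counter
from typing import Dict, Iterable

KEY_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def key_proportions(keys: Iterable[str]) -> Dict[str, float]:
    """Return the share of songs in each of the 12 keys (0.0 if a key is absent)."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in KEY_NAMES}
    return {name: counts.get(name, 0) / total for name in KEY_NAMES}


# Made-up toy data, NOT the real corpus.
keys_2015 = ["A", "A", "D", "G", "A", "D"]
keys_2019 = ["D", "C", "D", "A", "F", "G"]

props_2015 = key_proportions(keys_2015)
props_2019 = key_proportions(keys_2019)

if __name__ == "__main__":
    for name in KEY_NAMES:
        change = props_2019[name] - props_2015[name]
        print(f"{name:>2}: 2015={props_2015[name]:.2f} "
              f"2019={props_2019[name]:.2f} change={change:+.2f}")
```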

To figure this out, I made a table of the artists I listened to in each year who wrote songs in A, and looked at the artists with the highest frequency. Not to my surprise, most songs in A were written by artists like Mac DeMarco, Beach House, Grizzly Bear, and The Black Keys, who are all alternative/indie artists using guitars. The A chord is popular in songs written on guitar, since it can be played as an open chord and it goes well with many other open chords. Since 2015, I have started listening to a lot less guitar-centered music, which might explain why I am also listening to fewer songs in the key of A.
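A frequency table like this amounts to filtering the tracks by year and key and then counting artists. Here is a minimal Python sketch of that step, not the original R code; `top_artists_in_key` is a hypothetical helper, the field names are assumptions, and the tracks are placeholders rather than real listening data.

```python
from collections import Counter
from typing import Dict, List, Tuple


def top_artists_in_key(
    tracks: List[Dict], year: int, key: str, n: int = 5
) -> List[Tuple[str, int]]:
    """Count plays per artist for one year and one key, most frequent first."""
    counts = Counter(
        track["artist"]
        for track in tracks
        if track["year"] == year and track["key"] == key
    )
    return counts.most_common(n)


# Placeholder tracks, NOT the real corpus.
tracks = [
    {"artist": "Artist X", "year": 2015, "key": "A"},
    {"artist": "Artist X", "year": 2015, "key": "A"},
    {"artist": "Artist Y", "year": 2015, "key": "A"},
    {"artist": "Artist Y", "year": 2015, "key": "D"},
    {"artist": "Artist Z", "year": 2019, "key": "A"},
]

if __name__ == "__main__":
    for year in (2015, 2019):
        print(year, top_artists_in_key(tracks, year, "A"))
```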

Tempo mean and standard deviation


For this plot, I used my top 15 songs from 2015 and from 2019. Since I had to use the data I fetched from Last.fm, I had a hard time getting it into a shape that would work with the compmus package. I managed to get it right in the end, however, and the resulting plot is quite interesting.

Here you can see the mean tempo plotted against the standard deviation (SD) of tempo, with colour indicating tempo, size indicating song duration, and opacity indicating loudness. There seem to be quite large differences between 2015 and 2019. First of all, the range of tempo is much larger in 2019: it spans from ~70 to ~160 BPM, while in 2015 the tempo is clustered around 100 BPM. This suggests that by 2019 my music taste had become more varied. The same can be seen in song duration and loudness: in 2015 the songs are quite similar on both, while in 2019 they vary more.

Interestingly, the standard deviation of tempo seems to increase with tempo. Does this mean that higher-tempo songs also deviate more in tempo? I do not have an explanation for this yet.
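One way to probe this question is to compare the SD with the coefficient of variation (SD divided by mean). The SD is measured in absolute BPM, so the same relative fluctuation produces a larger SD in a faster song. The Python sketch below illustrates that arithmetic with made-up numbers; it is not the compmus analysis, `tempo_summary` and `pearson` are hypothetical helpers, and it assumes a list of tempo estimates per song.

```python
from math import sqrt
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def tempo_summary(tempi: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, sample SD, coefficient of variation) of tempo estimates."""
    m = mean(tempi)
    s = stdev(tempi)
    return m, s, s / m


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


# Made-up songs whose tempo fluctuates by the same 5% around the mean.
songs: List[List[float]] = [
    [95.0, 100.0, 105.0],
    [114.0, 120.0, 126.0],
    [152.0, 160.0, 168.0],
]

summaries = [tempo_summary(song) for song in songs]
means = [m for m, _, _ in summaries]
sds = [s for _, s, _ in summaries]
cvs = [cv for _, _, cv in summaries]
r = pearson(means, sds)

if __name__ == "__main__":
    print("means:", means)
    print("SDs:  ", sds)
    print("CVs:  ", cvs)
    print("correlation between mean and SD:", r)
```

In this toy data the SD rises with the mean while the coefficient of variation stays constant. If the real songs showed the same pattern, the rising SD would reflect the scale of the measurement rather than less steady fast songs; whether that holds for this corpus would need to be checked.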

Spotify features

Tempo